Publishing the Trove Newspaper Corpus
Abstract
The Trove Newspaper Corpus is derived from the National Library of Australia’s (NLA) digital archive of newspaper text. The corpus is a snapshot of the NLA collection, taken in 2015 so that it could be made available for language research as part of the Alveo Virtual Laboratory, and contains 143 million articles dating from 1806 to 2007. This paper describes the work we have done to make this large corpus available as a research collection, facilitating access to individual documents and enabling large-scale processing of the newspaper text in a cloud-based environment.
Similar resources
Manipulative Propaganda Techniques
Influencing the public attitude towards certain topics has become one of the strongest weapons in today’s information warfare. The ability to recognize the presence of propaganda in newspaper texts is thus a treasured phenomenon, which is not directly transferable to algorithmic analysis. In the current paper, we present the first steps of the project aiming at detection and recognition of select...
English and Persian Sport Newspaper Headlines: A comparative study of linguistic means
Abstract: Using rhetorical figures in specialized languages like the language of newspaper headlines is common. The present study attempted to conduct a contrastive analysis of the English and Persian sport newspaper headlines related to the 2014 FIFA World Cup. Toward this end, a corpus consisting of 400 English and 400 Persian headlines published from 12 June to 13 July 2014 was c...
Reflection of Knowledge and Information Science’s News in the Press: A Case Study of Iran Newspaper
Background and Aim: The present study aims to explore the coverage and reflection of Knowledge and Information Science news in the Iranian press. Iran Newspaper which is one of the main public newspapers in the country has been selected as the case for this study. Method: This study used content analysis as its research methodology and adopted an inductive approach in data analysis. All the pag...
New production models for newspaper organizations
Information delivery is undergoing profound changes. The established media such as radio, television, and newspapers are faced with a variety of new digital content formats. New dimensions of publishing can be exploited in the case of newspaper production. This paper investigates the changes in the production models of newspaper organizations caused by the introduction of information technology...
Publication date: 2016